Models of Translation Competitions
Abstract
What do we want to learn from a translation competition, and how do we learn it with confidence? We argue that a disproportionate focus on ranking competition participants has led to many different rankings, but little insight into which rankings we should trust. In response, we provide the first framework that allows an empirical comparison of different analyses of competition results. We then use this framework to compare several analytical models on data from the Workshop on Machine Translation (WMT).

1 The WMT Translation Competition

Every year, the Workshop on Machine Translation (WMT) conducts a competition between machine translation systems. The WMT organizers invite research groups to submit translation systems in eight different tracks: Czech to/from English, French to/from English, German to/from English, and Spanish to/from English. For each track, the organizers also assemble a panel of judges, typically machine translation specialists.[1]

The role of a judge is to repeatedly rank five different translations of the same source text. Ties are permitted. In Table 1, we show an example[2] where a judge (we’ll call him “jdoe”) has ranked five translations of the French sentence “Il ne va pas.” Each such elicitation encodes ten pairwise comparisons, one for each of the 5 × 4 / 2 = 10 pairs of systems, as shown in Table 2. For each competition track, WMT typically elicits between 5,000 and 20,000 comparisons. Once the elicitation process is complete, WMT faces a large database of comparisons and a question that must be answered: whose system is the best?

[1] In recent competitions, however, some of the judging has also been crowdsourced (Callison-Burch et al., 2010).
[2] The example does not use actual system output.

rank      system   translation
1         bbn      “He does not go.”
2 (tie)   uedin    “He goes not.”
2 (tie)   jhu      “He did not go.”
4         cmu      “He go not.”
5         kit      “He not go.”

Table 1: WMT elicits preferences by asking judges to simultaneously rank five translations, with ties permitted.
In this (fictional) example, the source sentence is the French “Il ne va pas.”

source text        sys1    sys2    judge   preference
“Il ne va pas.”    bbn     cmu     jdoe    1
“Il ne va pas.”    bbn     jhu     jdoe    1
“Il ne va pas.”    bbn     kit     jdoe    1
“Il ne va pas.”    bbn     uedin   jdoe    1
“Il ne va pas.”    cmu     jhu     jdoe    2
“Il ne va pas.”    cmu     kit     jdoe    1
“Il ne va pas.”    cmu     uedin   jdoe    2
“Il ne va pas.”    jhu     kit     jdoe    1
“Il ne va pas.”    jhu     uedin   jdoe    0
“Il ne va pas.”    kit     uedin   jdoe    2

Table 2: Pairwise comparisons encoded by Table 1. A preference of 0 means neither translation was preferred. Otherwise, the preference specifies the preferred system (1 for sys1, 2 for sys2).
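Turning Table 1 into Table 2 is a mechanical step, so a few lines of Python can sketch it. This is a minimal illustration, not WMT's actual tooling. The names `Comparison`, `ranking_to_comparisons`, and `win_rates` are hypothetical. The sketch assumes the following, matching Table 2:

- each pair lists its systems in alphabetical order;
- a lower rank number is better;
- equal ranks mean a tie.

The last function computes a naive per-system win rate over non-tied comparisons, only to make the question "whose system is the best?" concrete. It is a heuristic added for illustration, not a reconstruction of the paper's models.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple


class Comparison(NamedTuple):
    """One row in the style of Table 2 (hypothetical container)."""
    source: str
    sys1: str
    sys2: str
    judge: str
    preference: int  # 0 = tie, 1 = sys1 preferred, 2 = sys2 preferred


def ranking_to_comparisons(source: str, judge: str, ranks: dict[str, int]) -> list[Comparison]:
    """Expand one judge's ranking into every pairwise comparison.

    Lower rank numbers are better; equal ranks are ties. Pairs are listed
    with sys1 < sys2 alphabetically, matching the order used in Table 2.
    """
    rows = []
    for sys1, sys2 in combinations(sorted(ranks), 2):
        r1, r2 = ranks[sys1], ranks[sys2]
        if r1 == r2:
            preference = 0
        elif r1 < r2:
            preference = 1
        else:
            preference = 2
        rows.append(Comparison(source, sys1, sys2, judge, preference))
    return rows


def win_rates(rows: list[Comparison]) -> dict[str, float]:
    """Naive heuristic: fraction of its non-tied comparisons each system won."""
    wins: Counter = Counter()
    decided: Counter = Counter()
    for c in rows:
        if c.preference == 0:
            continue  # a tie counts toward neither system
        winner = c.sys1 if c.preference == 1 else c.sys2
        wins[winner] += 1
        decided[c.sys1] += 1
        decided[c.sys2] += 1
    return {system: wins[system] / decided[system] for system in decided}


# The fictional judgment from Table 1: uedin and jhu are tied at rank 2.
table1 = {"bbn": 1, "uedin": 2, "jhu": 2, "cmu": 4, "kit": 5}
comparisons = ranking_to_comparisons("Il ne va pas.", "jdoe", table1)

for c in comparisons:
    print(c.source, c.sys1, c.sys2, c.judge, c.preference)

for system, rate in sorted(win_rates(comparisons).items(), key=lambda kv: -kv[1]):
    print(f"{system}: {rate:.2f}")
```

Run on the Table 1 ranking, the sketch reproduces the ten rows of Table 2 in the same order, with preferences 1, 1, 1, 1, 2, 1, 2, 1, 0, 2. The win-rate tally shows how quickly a ranking can be read off such data. How far such rankings can be trusted is exactly the question the paper raises.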
Similar resources
Efficient Elicitation of Annotations for Human Evaluation of Machine Translation
A main output of the annual Workshop on Statistical Machine Translation (WMT) is a ranking of the systems that participated in its shared translation tasks, produced by aggregating pairwise sentence-level comparisons collected from human judges. Over the past few years, there have been a number of tweaks to the aggregation formula in attempts to address issues arising from the inherent ambiguity...
Better Alignments = Better Translations?
Automatic word alignment is a key step in training statistical machine translation systems. Despite much recent work on word alignment methods, improvements in alignment accuracy often produce little or no improvement in machine translation quality. In this work we analyze a recently proposed agreement-constrained EM algorithm for unsupervised alignment models. We attempt to tease apart the effects t...
An Inquiry into Intersemiotic Translation of Children's Books: A Case Study of Illustration and Rendition
The present study was an attempt to find out whether there were any exclusive strategies regarding illustration in the rendition of children’s illustrated books. Further, it investigated whether the existing intersemiotic models were sufficiently responsive to the demands of Iranian translators and children. Cases of the study included three children’s illustrated story books for age groups B ...
Edinburgh system description for the 2005 IWSLT speech translation evaluation
Our participation in the IWSLT 2005 speech translation task is our first effort to work on limited domain speech data. We adapted our statistical machine translation system that performed successfully in previous DARPA competitions on open domain text translations. We participated in the supplied corpora transcription track. We achieved the highest BLEU score in 2 out of 5 language pairs and ha...
Centralized Supply Chain Network Design: Monopoly, Duopoly, and Oligopoly Competitions under Uncertainty
This paper presents a competitive supply chain network design problem in which one, two, or three supply chains are planning to enter the price-dependent markets simultaneously in uncertain environments and decide to set the prices and shape their networks. The chains produce competitive products either identical or highly substitutable. Fuzzy multi-level mixed integer programming is used to mo...
The Rise of Modern Persian Literature through Translation in Iran
Translation is an indispensable tool for communication between the diverse linguistic groups. It opens new horizons for the people living in a country so that it makes changes and improvements in their society, especially in the literature. Through the translation process, some literary principles and elements are introduced into the home literature which did not exist before. These features em...